CALText: Contextual Attention Localization for Offline Handwritten Text

Abstract

Recognition of Arabic-like scripts such as Persian and Urdu is more challenging than Latin-based scripts. This is due to the presence of a two-dimensional structure, context-dependent character shapes, spaces and overlaps, and placement of diacritics. We present an attention-based encoder-decoder model that learns to read handwritten text in context. A novel localization penalty is introduced to encourage the model to attend to only one location at a time when recognizing the next character. In addition, we comprehensively refine the only complete and publicly available handwritten Urdu dataset in terms of ground-truth annotations. We evaluate the model on both Urdu and Arabic datasets. For Urdu, contextual attention localization achieves 82.06% character recognition rate and 51.97% word recognition rate, which represent a 2× improvement over existing bi-directional LSTM models. For Arabic, the model outperforms multi-directional LSTM models with 77.47% character recognition rate and 37.66% word recognition rate without performing any slant or skew correction. Code and pre-trained models for this work are available at https://github.com/nazar-khan/CALText .
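The abstract reports character and word recognition rates without defining them. A minimal sketch of one common way to compute such rates is shown below, assuming they are defined as 100 × (1 − edit distance / reference length) accumulated over all samples; the paper's exact definition may differ. The function names `edit_distance` and `recognition_rate` are illustrative and are not taken from the CALText repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def recognition_rate(refs, hyps, unit="char"):
    """Recognition rate in percent over paired reference/hypothesis strings.

    Assumed definition: 100 * (1 - total edit errors / total reference units),
    where a unit is a character or a whitespace-separated word.
    """
    total_errors = 0
    total_units = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            ref, hyp = ref.split(), hyp.split()
        total_errors += edit_distance(ref, hyp)
        total_units += len(ref)
    return 100.0 * (1.0 - total_errors / total_units)


if __name__ == "__main__":
    refs = ["the cat sat"]
    hyps = ["the cat sit"]
    print(recognition_rate(refs, hyps, unit="char"))
    print(recognition_rate(refs, hyps, unit="word"))
```

Under this definition, a single wrong character costs far more at the word level than at the character level, which is consistent with the word rates in the abstract being much lower than the character rates.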

Similar resources

Experiments in Unconstrained Offline Handwritten Text Recognition

A system for off-line handwritten text recognition is presented. It is characterized by a segmentation-free approach, i.e. whole lines of text are processed by the recognition module. The methods used for pre-processing, feature extraction, and statistical modelling are described, and several experiments on writer-independent, multiple writer, and single writer handwriting recognition tasks are...

HIT-MW Dataset for Offline Chinese Handwritten Text Recognition

A Chinese handwritten text dataset, HIT-MW, is presented to facilitate offline Chinese handwritten text recognition. Texts for handcopying are sampled from the China Daily corpus in a stratified random manner. To collect naturally written handwriting, forms are distributed by postal mail or through a middleman instead of face to face. The current version of HIT-MW includes 853 forms and 186,444 charact...

Rejection strategies for offline handwritten text line recognition

This paper investigates rejection strategies for unconstrained offline handwritten text line recognition. The rejection strategies depend on various confidence measures that are based on alternative word sequences. The alternative word sequences are derived from specific integration of a statistical language model in the hidden Markov model based recognition system. Extensive experiments on the...

Ensemble methods for offline handwritten text line recognition

This thesis investigates ensemble methods for offline recognition of English handwritten text lines. Multiple recognisers are automatically generated from a single base recognition system. Combining the output of these multiple recognisers provides the final ensemble result. The underlying recognisers are based on hidden Markov models. One model is built for each character. Based on the lexicon...

Offline Recognition of Large Vocabulary Cursive Handwritten Text

This paper presents a system for the offline recognition of cursive handwritten lines of text. The system is based on continuous density HMMs and Statistical Language Models. The system recognizes data produced by a single writer. No a-priori knowledge is used about the content of the text to be recognized. Changes in the experimental setup with respect to the recognition of single words are hi...

Journal

Journal title: Neural Processing Letters

Year: 2023

ISSN: 1573-773X, 1370-4621

DOI: https://doi.org/10.1007/s11063-023-11258-5